Semi-automatic Refinement of the JMdict/EDICT Japanese-English Dictionary
Abstract
The JMdict/EDICT Japanese-English Dictionary is a freely available dictionary distributed in XML (JMdict) and text (EDICT) formats. It is widely used as a source of lexical material in dictionary systems and text-processing projects. We propose two refinements to make the dictionary more computationally tractable: marking entries where the English is not a translation equivalent, and expanding contracted entries. We then propose and apply semi-automatic methods to refine existing entries. The resulting dictionary is shown to be more suitable for the construction of machine translation rules.
Similar resources
Enhancing a Dictionary for Transfer Rule Acquisition
The JMdict/EDICT Japanese-English Dictionary is a freely available dictionary distributed in XML (JMdict) and text (EDICT) formats. It is widely used as a source of lexical material in dictionary systems and text-processing projects. We propose two refinements to make the dictionary more computationally tractable: marking entries where the English is not a translation equivalent and expanding co...
Term Selection Term Selection Query - language Term Translation Doc - language Term Selection Term Weighting Term Matching Term Weighting Term Matching
This paper presents results for the Japanese/English cross-language information retrieval task on the NACSIS Test Collection. Two automatic dictionary-based query translation techniques were tried with four variants of the queries. The results indicate that longer queries outperform the required description-only queries and that use of the first translation in the edict dictionary is comparable w...
JMdict: A Japanese-Multilingual Dictionary
The JMdict project has as its aim the compilation of a multilingual lexical database with Japanese as the pivot language. Using an XML structure designed to cater for a mix of languages and a rich set of lexicographic information, it has reached a size of approximately 100,000 entries, with most entries having translations in English, French and German. The compilation involves information re-u...
Word Usage Examples in an Electronic Dictionary
This paper describes a project in which the Tanaka corpus of matched Japanese-English sentence pairs has been linked to the WWWJDIC online Japanese-English dictionary. The process of linking the corpus is described in detail, as well as an analysis of the word coverage, and the editing of the corpus to remove some of the errors it contains. The paper concludes that the Tanaka corpus can success...
Automatic Integrated Dictionary Systems
0. Introduction 1. AidTrans Project 2. I.D.S. Japanese Reading Course 3. Multiple-path Predictive Analysis 4. Sentence-for-sentence Analyser 5. English Output 6. Other Language Pairs 7. Japanese Script I/O 8. Automatic Integrated Dictionary Compiler 9. Automatic Integrated Dictionary 10. Man-aided Translating Machine 11. Japanese-English Teaching Machine 12. Japanese-English Scientific & Techni...